Using old Spam and Ham Samples to Train Email Filters
نویسندگان
چکیده
Email spam filters are commonly trained on a sample of recent spam and ham (non-spam) messages. We investigate the effect on filter performance of using samples of spam and ham messages sent months before those to be filtered. Our results show that filter performance deteriorates with the overall age of spam and ham samples, but at different rates. Spam and ham samples of different ages may be mixed to advantage, provided temporal cues are elided.
منابع مشابه
Classifying Unsolicited Bulk Email (UBE) using Python Machine Learning Techniques
Email has become one of the fastest and most economical forms of communication. However, the increase of email users has resulted in the dramatic increase of spam emails during the past few years. As spammers always try to find a way to evade existing filters, new filters need to be developed to catch spam. Generally, the main tool for email filtering is based on text classification. A classifi...
متن کاملA New Hybrid Approach of K-Nearest Neighbors Algorithm with Particle Swarm Optimization for E-Mail Spam Detection
Emails are one of the fastest economic communications. Increasing email users has caused the increase of spam in recent years. As we know, spam not only damages user’s profits, time-consuming and bandwidth, but also has become as a risk to efficiency, reliability, and security of a network. Spam developers are always trying to find ways to escape the existing filters therefore new filters to de...
متن کاملA Survey on Various Classifiers Detecting Gratuitous Email Spamming
Email becomes the major source of communication these days. Most humans on the earth use email for their personal or professional use. Email is an effective, faster and cheaper way of communication. The importance and usage for the email is growing day by day. It provides a way to easily transfer information globally with the help of internet. Due to it the email spamming is increasing day by d...
متن کاملCollaborative Spam Filtering with the Hashing Trick
User feedback is vital to the quality of the collaborative spam fi lters frequently used in open membership email systems such as Yahoo Mail or Gmail. Users occasionally designate emails as spam or non-spam (often termed as ham), and these labels are subsequently used to train the spam fi lter. Although the majority of users provide very little data, as a collective the amount of training data ...
متن کاملSpam Filtering with Naive Bayes - Which Naive Bayes?
Naive Bayes is very popular in commercial and open-source anti-spam e-mail filters. There are, however, several forms of Naive Bayes, something the anti-spam literature does not always acknowledge. We discuss five different versions of Naive Bayes, and compare them on six new, non-encoded datasets, that contain ham messages of particular Enron users and fresh spam messages. The new datasets, wh...
متن کامل